Abstract: This paper studies the effect of parametric mismatch
in minimum mean square error (MMSE) estimation. In particular,
we consider the problem of estimating the input signal
from the output of an additive white Gaussian noise channel whose
gain is fixed, but unknown. The input distribution is known,
and the estimation process consists of two algorithms. First, a
channel estimator blindly estimates the channel gain using past
observations. Second, a mismatched MMSE estimator, optimized
for the estimated channel gain, estimates the input signal. We
analyze the regret, i.e., the additional mean square error, that is
incurred in this process. We derive upper bounds on both absolute
and relative regret. The bounds are expressed in terms of the
Fisher information. We also study regret for unbiased, efficient
channel estimators, and derive a simple trade-off between Fisher
information and relative regret. This trade-off shows that the
product of a certain function of relative regret and Fisher
information equals the signal-to-noise ratio, independent of the
input distribution. The trade-off relation implies that higher
Fisher information results in smaller expected relative regret.



I. Introduction 

Consider an application in which you are given the output of a
system, and you seek to recover the input of the system. You
know that the system is noisy, e.g., it adds white Gaussian
noise to the output. You know the distribution of the input,
but you do not know the system parameters. Problems of this
sort arise in different applications in signal processing and
communication systems. Some examples include blind deconvolution
[1], dereverberation [2], denoising [3], and mismatched
decoding [4]. These applications differ in their fundamental
models, fidelity criteria, and methodologies. However, they
have one thing in common: they all suffer from parametric
mismatch in recovering the input signals.

The motivation for this work comes from blind deconvolution and
dereverberation applications. Linear time-invariant channels
serve as common models in these applications. As the input
signal passes through such a channel, it is convolved with the
unknown finite impulse response (FIR) of the channel and
corrupted by additive white Gaussian noise (of known variance).
Recovering the input signal from the noisy output could be
impossible even with perfect knowledge of the channel
response. This is out of our scope. Instead, we aim to study
the penalty and performance degradation that is specifically
caused by the lack of knowledge about the channel response.

We benchmark performance against that of the perfect channel
knowledge scenario. We are concerned with issues such as the
sample complexity or training required in channel estimation
to bring the performance of input estimation within a desired
range. As a counterpart problem in communication systems,
one may think of block fading channels and the trade-off
between accuracy of channel estimation and performance of
decoding [5]. Note that channel estimation in our case is blind,
as we have no control over the source.

As a first step to address these problems, in this work,
we focus on the most basic system in which the unknown
channel is just a single gain. We expect that the results and
intuitions of this work will shed light on the analysis of
generic FIR channels.$^1$ In treating the problem, we consider
an estimation process that consists of two algorithms. First, a
channel estimator blindly estimates the channel gain using past
output observations. Second, a mismatched minimum mean
square error (MMSE) estimator, optimized for the estimated
channel gain, estimates the input signal. Figure 1 illustrates
the building blocks of this process. Due to the error
in channel estimation, the MMSE estimator that is used in
recovering the input signal results in a mean square error that
is larger than that of the ideal MMSE estimator. We call this
additional error regret, and we derive novel upper bounds on
both absolute and relative regret. The bounds are simple and
demonstrate interesting connections to the Fisher information.
To this end, one might attempt to exploit the results of [6] and
[7] to derive alternative bounds.

We also quantify regret for unbiased, efficient channel
estimators. Since these estimators achieve the Cramér-Rao bound,
they result in a simple trade-off relation between Fisher
information and relative regret. This trade-off relation states
that the product of a certain function of relative regret and
Fisher information equals the signal-to-noise ratio,
independent of the input distribution. The trade-off suggests that
higher Fisher information results in smaller expected relative
regret. Although this may seem intuitively expected, the
simplicity of the trade-off relation makes it worthwhile.



II. Setup 

Consider a linear dynamic system

$$ Y_n = a X_n + V_n \tag{1} $$

in which $\{V_n\}$ is independent and identically distributed (i.i.d.)
Gaussian noise such that $V_n \sim \mathcal{N}(0, \sigma_v^2)$. The input $X_n$ is
an i.i.d. process whose distribution is known to be $P(X)$.
Parameter $a \in \mathbb{R}_+$ is a fixed, unknown channel gain. It
induces a parametric family of probability measures
$P_a(X, Y)$, the joint distribution of $X$ and $Y$, governing the
system dynamic (1). The objective is to observe a realization
of the output process

$$ Y^n \triangleq (Y_1, Y_2, \ldots, Y_n) $$

and estimate the realization of the underlying input process,
i.e.,

$$ X^n = (X_1, X_2, \ldots, X_n). $$

$^1$ Analogous to the relation between the analysis of flat-fading
channels and the analysis of frequency-selective channels.

Let $\mathcal{X} = \mathbb{R}$ and $\mathcal{Y} = \mathbb{R}$ denote the input and output spaces,
respectively. We consider memoryless input estimators, i.e.,
$\phi: \mathcal{Y} \to \mathcal{X}$ where $\phi(Y_n)$ is an estimate for $X_n$. The mean
square error (MSE) for $\phi$ is defined as

$$ \mathbb{E}\left[(X - \phi(Y))^2\right] = \int (x - \phi(y))^2 \, dP_a. \tag{2} $$

In Eq. (2) and henceforth we follow the convention that
unsubscripted expectations are taken with respect to $P_a(X, Y)$.
Moreover, we use concise notations like $P_a = P_a(X, Y)$
and $P_{a|y} = P_a(X | Y = y)$ to denote joint and conditional
distributions, respectively.

One seeks to find an estimator that minimizes the MSE (2). The
main challenge, however, is that $a$ and $P_a$ are unknown. If we
had oracle knowledge of $a$, the MMSE estimator for $X$ would be

$$ \phi_a(y) = \mathbb{E}[X | Y = y] \tag{3} $$

for an observation $Y = y$. Any other estimator $\phi$ results
in additional error that we call regret. The motivation for
this name is that it measures the degradation in performance
caused by imprecise knowledge of $a$.

In this paper, we assess regret for a special class of
mismatched estimators. Namely, we consider the estimation
process that is depicted in Figure 1. A channel estimator
works in parallel with an MMSE input estimator as follows.
At time instance $n$, the channel estimator finds an estimate
$\hat{a} = \hat{a}_n$ of $a$ using the observed values $Y^{n-1}$. Then, the
input estimator uses the optimal estimator for $P_{\hat{a}}(X, Y)$ to compute

$$ \phi_{\hat{a}}(y_n) = \mathbb{E}_{\hat{a}}[X_n | Y = y_n] \tag{4} $$

as an estimate for $X_n$. The function $\phi_{\hat{a}}$ is a mismatched MMSE
estimator that causes regret when used in place of $\phi_a$. In
the following sections, we study two types of regret: absolute
regret and relative regret.
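The two-stage process can be sketched numerically. The snippet below is only an illustration under assumptions that are not part of the setup above: the input is taken to be zero-mean Gaussian (so the MMSE estimator is linear in $y$), and the blind channel estimator is a simple moment-matching rule based on $\mathbb{E}[Y^2] = a^2\sigma_x^2 + \sigma_v^2$. The names `lmmse_gain`, `blind_gain_estimate`, and `simulate` are hypothetical.

```python
import math
import random

def lmmse_gain(a, var_x, var_v):
    """Coefficient c(a) of the Gaussian-input MMSE estimator phi_a(y) = c(a) * y."""
    return a * var_x / (a * a * var_x + var_v)

def blind_gain_estimate(outputs, var_x, var_v):
    """Moment-matching estimate of a from E[Y^2] = a^2 var_x + var_v.

    This estimator is an illustrative assumption, not the paper's algorithm.
    """
    second_moment = sum(y * y for y in outputs) / len(outputs)
    return math.sqrt(max(second_moment - var_v, 0.0) / var_x)

def simulate(a=1.0, var_x=1.0, var_v=0.1, n=20000, seed=0):
    rng = random.Random(seed)
    sd_x, sd_v = math.sqrt(var_x), math.sqrt(var_v)
    # Past outputs Y^{n-1}, generated by Y = a X + V.
    past = [a * rng.gauss(0.0, sd_x) + rng.gauss(0.0, sd_v) for _ in range(n - 1)]
    a_hat = blind_gain_estimate(past, var_x, var_v)
    # Fresh pair (X_n, Y_n); the mismatched estimator uses a_hat in place of a.
    x_n = rng.gauss(0.0, sd_x)
    y_n = a * x_n + rng.gauss(0.0, sd_v)
    x_hat = lmmse_gain(a_hat, var_x, var_v) * y_n
    return a_hat, x_n, x_hat

a_hat, x_n, x_hat = simulate()
```

The channel estimate only sees outputs, never inputs, which is what makes it blind; any other blind estimator could be substituted for `blind_gain_estimate` without changing the rest of the sketch.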

III. Absolute Regret 

A. Deviation Analysis 

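The absolute regret corresponding to $\phi_{\hat{a}}$ is

$$ R(\hat{a}, a) = \mathbb{E}\left[(X - \phi_{\hat{a}}(Y))^2\right] - \mathbb{E}\left[(X - \phi_a(Y))^2\right]. \tag{5} $$

Application of the orthogonality principle results in

$$ R(\hat{a}, a) = \mathbb{E}\left[(\phi_{\hat{a}}(Y) - \phi_a(Y))^2\right]. \tag{6} $$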
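As a sanity check, the equivalence of Eqs. (5) and (6) can be verified in closed form for a zero-mean Gaussian input, where both estimators are linear, $\phi_a(y) = c(a)\,y$. The Gaussian input and the function names below are assumptions made for illustration only.

```python
import math

def lmmse_gain(a, var_x, var_v):
    """c(a) such that phi_a(y) = c(a) * y for a zero-mean Gaussian input."""
    return a * var_x / (a * a * var_x + var_v)

def mse_linear(c, a, var_x, var_v):
    """E[(X - c*Y)^2] for Y = a X + V with zero-mean, independent X and V."""
    var_y = a * a * var_x + var_v
    return var_x - 2.0 * c * a * var_x + c * c * var_y

def regret_eq5(a_hat, a, var_x, var_v):
    """Difference of the two mean square errors, as in Eq. (5)."""
    return (mse_linear(lmmse_gain(a_hat, var_x, var_v), a, var_x, var_v)
            - mse_linear(lmmse_gain(a, var_x, var_v), a, var_x, var_v))

def regret_eq6(a_hat, a, var_x, var_v):
    """Mean squared distance between the two estimators, as in Eq. (6)."""
    var_y = a * a * var_x + var_v
    diff = lmmse_gain(a_hat, var_x, var_v) - lmmse_gain(a, var_x, var_v)
    return diff * diff * var_y

r5 = regret_eq5(1.2, 1.0, 1.0, 0.1)
r6 = regret_eq6(1.2, 1.0, 1.0, 0.1)
```

Both expressions evaluate to the same number up to floating-point rounding, which is the orthogonality principle at work: the error of the ideal estimator is uncorrelated with any function of $Y$.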
Fig. 1. Building blocks of the system setup. The estimation
process consists of two individual algorithms: 1) a channel estimation
algorithm that blindly computes $\hat{a}$ as an estimate for $a$, 2) a mismatched MMSE
estimation algorithm, optimized for $\hat{a}$, that recovers $X_n$.



Eq. (6) quantifies the absolute regret of using $\phi_{\hat{a}}$ instead of
$\phi_a$. The following lemma states and proves an upper bound
on (6).

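Lemma 3.1: For every $a$, the following holds true

$$ R(\hat{a}, a) \le (\hat{a} - a)^2 \, \mathbb{E}\left[\left(6\sigma_x^2 + 8\frac{Y^2}{a^2}\right) J(X; a \| Y)\right] + o(\hat{a} - a)^2 \tag{7} $$

in which the expectation is with respect to $Y$, and

$$ J(X; a \| Y) \triangleq \mathbb{E}\left[(\nabla \ln f_a(X|Y))^2 \,\middle|\, Y\right] \tag{8} $$

is the Fisher information of $X$ relative to $a$, conditioned on
$Y$. Here, $f_a(X|Y)$ is the density of $P_{a|Y}$.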

Proof: Refer to Appendix A. ■
Lemma 3.1 describes a bound (7) that comprises two multiplicative
terms. The first term, $(\hat{a} - a)^2$, measures the channel
estimation error. The second term is a weighted average of the
conditional Fisher information. Intuitively, this term measures
the amount of information that an observable random variable
$X$ carries about the unknown parameter $a$ conditioned on $Y$,
assigning more weight to larger values of $Y^2$.
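To get a feel for the two terms, the leading term of bound (7), $(\hat{a}-a)^2\,\mathbb{E}[(6\sigma_x^2 + 8Y^2/a^2)\,J(X;a\|Y)]$, can be evaluated in closed form for a zero-mean Gaussian input and compared with the exact regret. This is a sketch under that Gaussian assumption (which the lemma does not require); it uses the standard Fisher information of a Gaussian law whose mean and variance both depend on $a$, and all function names are hypothetical.

```python
import math

def gaussian_terms(a, var_x, var_v):
    """Closed forms for X ~ N(0, var_x), Y = a X + V, V ~ N(0, var_v).

    X | Y = y is N(c(a) y, m(a)); the Fisher information of that conditional
    law about a is alpha * y^2 + beta, with alpha = c'(a)^2 / m(a) and
    beta = m'(a)^2 / (2 m(a)^2).
    """
    var_y = a * a * var_x + var_v
    m = var_x * var_v / var_y                          # conditional variance m(a)
    dc = var_x * (var_v - a * a * var_x) / var_y ** 2  # c'(a)
    dm = -2.0 * a * var_x ** 2 * var_v / var_y ** 2    # m'(a)
    alpha = dc * dc / m
    beta = dm * dm / (2.0 * m * m)
    return var_y, alpha, beta

def bound_coefficient(a, var_x, var_v):
    """E[(6 var_x + 8 Y^2 / a^2) * J(X; a || Y)], using E[Y^4] = 3 var_y^2."""
    var_y, alpha, beta = gaussian_terms(a, var_x, var_v)
    return (6.0 * var_x * (alpha * var_y + beta)
            + 8.0 / (a * a) * (3.0 * alpha * var_y ** 2 + beta * var_y))

def exact_regret(a_hat, a, var_x, var_v):
    """R(a_hat, a) = (c(a_hat) - c(a))^2 * E[Y^2] for the linear estimators."""
    def c(g):
        return g * var_x / (g * g * var_x + var_v)
    return (c(a_hat) - c(a)) ** 2 * (a * a * var_x + var_v)

a, a_hat, var_x, var_v = 1.0, 1.05, 1.0, 0.1
leading_term = (a_hat - a) ** 2 * bound_coefficient(a, var_x, var_v)
regret = exact_regret(a_hat, a, var_x, var_v)
```

In this particular configuration the leading term sits well above the exact regret, so the comparison illustrates the structure of the bound rather than its tightness.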

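Corollary 3.1: If $|\hat{a} - a| \ll 1$, and if $J(X; a \| Y)$ and $Y^2$
are uncorrelated, we obtain the simple bound

$$ R(\hat{a}, a) \le (\hat{a} - a)^2 \left(14\sigma_x^2 + 8\frac{\sigma_v^2}{a^2}\right) J(X; a | Y) \tag{9} $$

in which

$$ J(X; a | Y) \triangleq \mathbb{E}\left[(\nabla \ln f_a(X|Y))^2\right] \tag{10} $$

is the average of $J(X; a \| Y)$ with respect to $Y$.$^2$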

B. Efficient Channel Estimation 

Neither Eq. (7) nor Eq. (9) depends on the channel estimation
algorithm that produces $\hat{a}$. They simply relate a small deviation
between $\hat{a}$ and $a$ to the absolute regret in estimating $X$. To
incorporate the effect of the channel estimation algorithm, we
proceed as follows.

As mentioned earlier, at time $n$, $\hat{a}$ is obtained through
observation of $Y^{n-1} = (Y_i)_{i=1}^{n-1}$. In formal terms,

$$ \hat{a} = A_n(Y^{n-1}) $$

where $A = (A_1, A_2, \ldots)$ is a channel estimation algorithm in
which $A_n: \mathcal{Y}^{n-1} \to \mathbb{R}_+$.

$^2$ Note the subtle notational difference between $J(X; a \| Y)$, a
random variable, and $J(X; a | Y)$, a scalar.

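Lemma 3.2: Let $\mathcal{A}$ denote the class of all unbiased channel
estimation algorithms. If $\mathcal{A}$ contains an efficient estimator [8,
p. 92], the following holds true

$$ \inf_{A \in \mathcal{A}} \mathbb{E}\left[R(A_n(Y^{n-1}), a)\right] \le \frac{1}{n-1} \, \frac{\mathbb{E}\left[\left(6\sigma_x^2 + 8\frac{Y^2}{a^2}\right) J(X; a \| Y)\right]}{J(Y; a)} \tag{11} $$

for sufficiently large values of $n$.$^3$
Proof: Refer to Appendix B. ■

IV. Relative Regret

A. Deviation Analysis

Let

$$ RR(\hat{a}, a) = \mathbb{E}\left[\frac{(\phi_{\hat{a}}(Y) - \phi_a(Y))^2}{\mathbb{E}_{\hat{a}}[X^2|Y] + \mathbb{E}_a[X^2|Y]}\right] \tag{12} $$

denote the relative regret. The following lemma states and
proves a simple upper bound on $RR(\hat{a}, a)$.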
Lemma 4.1: For every $a$, we have

$$ RR(\hat{a}, a) \le (\hat{a} - a)^2 J(X; a | Y) + o(\hat{a} - a)^2 \tag{13} $$

where $J(X; a | Y)$ is defined as in Eq. (10) and denotes the
conditional Fisher information of $X$ relative to $a$.

Proof: See Appendix C. ■
Eq. (13) results in a simple upper bound on the relative regret
for small deviations between $\hat{a}$ and $a$.

B. Efficient Channel Estimation 

Similar to the case for absolute regret, we now state the 
following result. 

Lemma 4.2: Let $\mathcal{A}$ denote the class of all unbiased
estimation algorithms. If $\mathcal{A}$ contains an efficient estimator, the
following holds true

$$ \inf_{A \in \mathcal{A}} \mathbb{E}\left[RR(A_n(Y^{n-1}), a)\right] \le \frac{1}{n-1} \, \frac{J(X; a | Y)}{J(Y; a)} \tag{14} $$

for sufficiently large values of $n$.

Proof: The proof of this lemma is essentially the same 
as the proof of Lemma 3.2. ■ 
Lemma 4.2 describes a bound on the expected relative regret, 
should an efficient estimator be used. This bound determines 
the smallest upper-bound on average relative regret, when 
sufficiently good unbiased channel estimators are used. 
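Read as a sample-complexity statement, Eq. (14) says that driving the bound below a target level $\epsilon$ requires roughly $n - 1 \ge \frac{J(X;a|Y)}{J(Y;a)} / \epsilon$ past observations. The helper below is a minimal sketch of that arithmetic; the function name and its arguments are hypothetical, and the result is only meaningful in the large-$n$ regime where the lemma applies.

```python
import math

def samples_for_relative_regret(fisher_ratio, target):
    """Smallest n such that fisher_ratio / (n - 1) <= target.

    fisher_ratio stands for J(X; a | Y) / J(Y; a), the constant in Eq. (14);
    target is the desired level of expected relative regret.
    """
    if fisher_ratio < 0.0 or target <= 0.0:
        raise ValueError("fisher_ratio must be >= 0 and target must be > 0")
    return 1 + math.ceil(fisher_ratio / target)

n_required = samples_for_relative_regret(2.0, 0.25)
```

Halving the target doubles the required number of past observations, which is the $1/(n-1)$ decay of the bound seen from the other side.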

C. Regret Scalar 

The constant on the RHS of Eq. (14) deserves attention.
It does not change with $n$, and as $n \to \infty$, it
becomes the sole scalar that determines the level of relative
regret. We define this quantity as the regret scalar and denote
it by

$$ \rho(a) \triangleq \frac{J(X; a | Y)}{J(Y; a)}. \tag{15} $$

$^3$ The expectation in the LHS is with respect to $Y^{n-1}$.

Lemma 4.3: For every zero-mean input distribution $P(X)$,
the following trade-off holds true between the regret scalar and
the output Fisher information

$$ (\rho(a) + 1) \, J(Y; a) = \frac{\sigma_x^2}{\sigma_v^2}. \tag{16} $$

Proof: See Appendix D. ■
In Eq. (16), the RHS is the signal-to-noise ratio, which is
independent of $a$. Thus, Eq. (16) presents a simple product
trade-off relationship between $\rho(a)$ and $J(Y; a)$. It suggests
that the higher the Fisher information, the smaller the regret
scalar, and vice versa. The following example illustrates this
trade-off.

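Example 4.1 (Gaussian Input): Assume $X_n \sim \mathcal{N}(0, \sigma_x^2)$
and $V_n \sim \mathcal{N}(0, \sigma_v^2)$ are i.i.d., implying that
$Y_n \sim \mathcal{N}(0, a^2\sigma_x^2 + \sigma_v^2)$ and $Y_n | x_n \sim \mathcal{N}(a x_n, \sigma_v^2)$.
With perfect knowledge of $a$,
the ideal estimator for $X$ given $Y = y$ is

$$ \phi_a(y) = \frac{a\sigma_x^2}{a^2\sigma_x^2 + \sigma_v^2} \, y. \tag{17} $$

The MMSE resulting from this estimator is

$$ \mathbb{E}\left[(X - \phi_a(Y))^2\right] = \frac{\sigma_x^2\sigma_v^2}{a^2\sigma_x^2 + \sigma_v^2}. \tag{18} $$

A mismatched estimator built from $\hat{a}$ is

$$ \phi_{\hat{a}}(y) = \frac{\hat{a}\sigma_x^2}{\hat{a}^2\sigma_x^2 + \sigma_v^2} \, y. \tag{19} $$

We have

$$ J(Y; a | X) = \frac{\sigma_x^2}{\sigma_v^2} \tag{20} $$

and

$$ J(Y; a) = \frac{2a^2\sigma_x^4}{(a^2\sigma_x^2 + \sigma_v^2)^2}. \tag{21} $$

Thus,

$$ \rho(a) = \frac{1}{2} \, \frac{(a^2\sigma_x^2 + \sigma_v^2)^2}{a^2\sigma_x^2\sigma_v^2} - 1. \tag{22} $$

Figure 2 depicts the behavior of $\rho(a)$ and $J(Y; a)$ with respect
to $a$. The SNR is $\sigma_x^2/\sigma_v^2 = 10$ dB, and at $a = 0.35$ the minimum
regret scalar coincides with the maximum Fisher information.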
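The trade-off of Eq. (16) can be checked numerically for the Gaussian example. In the sketch below, $J(X;a|Y)$ is computed directly from the conditional law of $X$ given $Y$ (a Gaussian whose mean and variance both depend on $a$) rather than from Eq. (16), so the product test is not circular. Function names and the grid are assumptions made for illustration. Under these closed forms the maximizer of $J(Y;a)$ is $a = \sigma_v/\sigma_x$, which for an SNR of 10 dB evaluates to roughly 0.32, in the vicinity of the value quoted for Figure 2.

```python
import math

def fisher_output(a, var_x, var_v):
    """J(Y; a) for Y ~ N(0, a^2 var_x + var_v)."""
    return 2.0 * a * a * var_x ** 2 / (a * a * var_x + var_v) ** 2

def fisher_input_given_output(a, var_x, var_v):
    """J(X; a | Y): average Fisher information of the law of X given Y."""
    var_y = a * a * var_x + var_v
    m = var_x * var_v / var_y                          # Var[X | Y]
    dc = var_x * (var_v - a * a * var_x) / var_y ** 2  # derivative of c(a)
    mean_part = dc * dc * var_y / m                    # E[(c'(a) Y)^2] / m
    var_part = 2.0 * a * a * var_x ** 2 / var_y ** 2   # m'(a)^2 / (2 m^2)
    return mean_part + var_part

def regret_scalar(a, var_x, var_v):
    """rho(a) = J(X; a | Y) / J(Y; a), as in Eq. (15)."""
    return fisher_input_given_output(a, var_x, var_v) / fisher_output(a, var_x, var_v)

var_x, var_v = 1.0, 0.1                      # SNR = var_x / var_v = 10 dB
grid = [0.01 + 0.001 * k for k in range(2000)]
products = [(regret_scalar(g, var_x, var_v) + 1.0) * fisher_output(g, var_x, var_v)
            for g in grid]
best_fisher_a = max(grid, key=lambda g: fisher_output(g, var_x, var_v))
best_rho_a = min(grid, key=lambda g: regret_scalar(g, var_x, var_v))
```

Every entry of `products` equals the SNR up to rounding, and the grid points that maximize $J(Y;a)$ and minimize $\rho(a)$ land at essentially the same gain.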

V. Recap and Conclusion 

We considered the problem of estimating the input signal
from the output of an additive white Gaussian noise channel
subject to parametric uncertainty. Namely, the channel gain is
fixed, but unknown. In treating the problem, we considered
an estimation process that consists of two algorithms: a blind
channel estimator and a mismatched MMSE estimator to
estimate the input. We studied the regret that is incurred as a
result of mismatched estimation. Simple upper bounds on both
absolute and relative regret were presented. These bounds
provide useful tools for assessing the deviation in estimating the
input when there is a small deviation in channel gain
estimation. The bounds are simple and expressed in terms of
the Fisher information. This makes them more intuitive and
could potentially connect them to other known results in the literature.

Fig. 2. Multiplicative trade-off between the regret scalar
and the Fisher information. Smaller Fisher information results in a larger regret
scalar and vice versa. The SNR is 10 dB, and the minimum regret scalar
coincides with the maximum Fisher information.

We also quantified regret for unbiased, efficient channel
estimators. Using the Cramér-Rao bound, we derived a simple
trade-off between Fisher information and relative regret. This
trade-off states that the product of a certain function of
relative regret and the Fisher information equals the
signal-to-noise ratio, independent of the input distribution. The
trade-off suggests that the higher the Fisher information, the
smaller the expected relative regret.

This work is our initial attempt to shed light on information- 
theoretic limits of blind deconvolution and dereverberation 
systems. We are currently working on generalization of these 
results to these applications. 



Appendix 

A. Proof of Lemma 3.1 

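To derive an upper bound on the absolute regret, we first state
and prove the following results.

Proposition A.1: For every $a$ and $y \in \mathcal{Y}$, we have

$$ (\phi_{\hat{a}}(y) - \phi_a(y))^2 \le 2\left(\mathbb{E}_{\hat{a}}[X^2|y] + \mathbb{E}_a[X^2|y]\right) D(P_{\hat{a}|y} \| P_{a|y}). \tag{23} $$

Proof: By definition, we have

$$ (\phi_{\hat{a}}(y) - \phi_a(y))^2 = \left(\int x \left(\frac{dP_{\hat{a}|y}}{dQ} - \frac{dP_{a|y}}{dQ}\right) dQ\right)^2 $$

for every probability measure $Q$ such that $P_{a|y} \ll Q$ and
$P_{\hat{a}|y} \ll Q$. By the Cauchy-Schwarz inequality, we obtain

$$ (\phi_{\hat{a}}(y) - \phi_a(y))^2 \le \int x^2 \left(\sqrt{\frac{dP_{\hat{a}|y}}{dQ}} + \sqrt{\frac{dP_{a|y}}{dQ}}\right)^2 dQ \; \int \left(\sqrt{\frac{dP_{\hat{a}|y}}{dQ}} - \sqrt{\frac{dP_{a|y}}{dQ}}\right)^2 dQ. \tag{24} $$

By the inequality $(u + v)^2 \le 2(u^2 + v^2)$, one can show that the
first term in the RHS of the above inequality is smaller than
or equal to

$$ 2\left(\mathbb{E}_{\hat{a}}[X^2|y] + \mathbb{E}_a[X^2|y]\right). $$

The second term in the RHS of inequality (24) is twice the squared
Kakutani-Hellinger distance between $P_a(X|y)$ and $P_{\hat{a}}(X|y)$,
defined by [9, p. 363]

$$ r^2(P_{a|y}, P_{\hat{a}|y}) \triangleq \frac{1}{2} \int \left(\sqrt{\frac{dP_{\hat{a}|y}}{dQ}} - \sqrt{\frac{dP_{a|y}}{dQ}}\right)^2 dQ. $$

Moreover, we know of the following inequality between the
Kakutani-Hellinger distance and the Kullback-Leibler distance [9,
p. 369]

$$ 2 r^2(P_{\hat{a}|y}, P_{a|y}) \le D(P_{\hat{a}|y} \| P_{a|y}). $$

Substituting in (24), we obtain Eq. (23). ■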
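The inequality between the Kakutani-Hellinger and Kullback-Leibler distances is easy to probe numerically. The sketch below uses two equal-variance Gaussians as stand-ins for $P_{\hat{a}|y}$ and $P_{a|y}$ (an assumption made purely for illustration, since both distances then have closed forms), with the convention $r^2 = \frac{1}{2}\int(\sqrt{p} - \sqrt{q})^2 = 1 - \int\sqrt{pq}$.

```python
import math

def hellinger_sq(mu0, mu1, sigma):
    """r^2 = 1 - int sqrt(p q) for p = N(mu0, sigma^2), q = N(mu1, sigma^2)."""
    return 1.0 - math.exp(-((mu0 - mu1) ** 2) / (8.0 * sigma * sigma))

def kl_divergence(mu0, mu1, sigma):
    """D(p || q) for two Gaussians with common standard deviation sigma."""
    return (mu0 - mu1) ** 2 / (2.0 * sigma * sigma)

gaps = [0.01, 0.1, 0.5, 1.0, 3.0]
checks = [(2.0 * hellinger_sq(0.0, d, 1.0), kl_divergence(0.0, d, 1.0)) for d in gaps]
```

For each mean gap in `gaps`, the first entry of the pair ($2r^2$) stays below the second (the Kullback-Leibler distance), as the inequality requires.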

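Proposition A.2: For every $a$ and $y \in \mathcal{Y}$, the following
inequality holds true

$$ \mathbb{E}_a[X^2|y] \le 3\sigma_x^2 + 4\frac{y^2}{a^2}. \tag{25} $$

Proof: Let $f_a(y|x)$ and $f_a(y)$ denote the conditional and
marginal densities for $P_a(X, Y)$, and let $f(x)$ denote the input density. Then,

$$ \begin{aligned}
\mathbb{E}_a[X^2|y] &= \int x^2 \frac{f_a(y|x)}{f_a(y)} f(x)\,dx \\
&= \int_{x: f_a(y|x) \le f_a(y)} x^2 \frac{f_a(y|x)}{f_a(y)} f(x)\,dx + \int_{x: f_a(y|x) > f_a(y)} x^2 \frac{f_a(y|x)}{f_a(y)} f(x)\,dx \\
&\le \mathbb{E}[X^2] + \int_{x: f_a(y|x) > f_a(y)} x^2 \frac{f_a(y|x)}{f_a(y)} f(x)\,dx.
\end{aligned} \tag{26} $$

To simplify the second term, we bound $x^2$ using the inequality
that is derived as follows:

$$ f_a(y|x) > f_a(y) \implies (y - ax)^2 < -2\sigma_v^2 \ln\left(\sqrt{2\pi}\,\sigma_v f_a(y)\right). $$

Taking the square roots, we obtain

$$ |y - ax| < \sqrt{-2\sigma_v^2 \ln\left(\sqrt{2\pi}\,\sigma_v f_a(y)\right)}, \qquad |ax| < |y| + \sqrt{-2\sigma_v^2 \ln\left(\sqrt{2\pi}\,\sigma_v f_a(y)\right)}. $$

Taking the square of both sides of the previous inequality,
using the inequality $(u + v)^2 \le 2(u^2 + v^2)$, and bounding the logarithm by Jensen's inequality,
$-\ln\left(\sqrt{2\pi}\,\sigma_v f_a(y)\right) \le \int \frac{(y - ax')^2}{2\sigma_v^2} f(x')\,dx'$, we obtain

$$ x^2 \le 2\frac{y^2}{a^2} + 4\frac{\sigma_v^2}{a^2} \int \frac{(y - ax')^2}{2\sigma_v^2} f(x')\,dx'. $$

As a result, for a zero-mean input with $\mathbb{E}[X^2] = \sigma_x^2$, we obtain

$$ x^2 \le 4\frac{y^2}{a^2} + 2\sigma_x^2. $$

By substituting for $x^2$ in the second term of the RHS of Eq.
(26), we conclude Eq. (25). ■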
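As a result of Propositions A.1 and A.2, we obtain

$$ (\phi_{\hat{a}}(y) - \phi_a(y))^2 \le 2\left(6\sigma_x^2 + 4\frac{y^2}{\hat{a}^2} + 4\frac{y^2}{a^2}\right) D(P_{\hat{a}|y} \| P_{a|y}). \tag{27} $$

Moreover, the following equality is known between the Kullback-Leibler
distance and the Fisher information [10, p. 55]

$$ D(P_{\hat{a}|y} \| P_{a|y}) = \frac{(\hat{a} - a)^2}{2} J(X; a \| Y = y) + o(\hat{a} - a)^2, \tag{28} $$

where $J(X; a \| Y = y) = \mathbb{E}\left[(\nabla \ln f_a(X|Y))^2 \,\middle|\, Y = y\right]$ is the
Fisher information of $X$ relative to $a$, conditioned on $Y = y$.
Substitute Eq. (28) in Eq. (27) and note that

$$ \frac{1}{\hat{a}^2} = \frac{1}{a^2} + O(\hat{a} - a) $$

for $|\hat{a} - a| \ll 1$, so that replacing $\hat{a}$ by $a$ in Eq. (27) only affects the $o(\hat{a} - a)^2$ term.
Taking the expectation with respect to $Y$, we
conclude the proof of Lemma 3.1.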



B. Proof of Lemma 3.2 

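We know that $\hat{a} = A_n(Y^{n-1})$. For an unbiased estimator
and for sufficiently large values of $n$, $|A_n(Y^{n-1}) - a| \ll 1$
and

$$ R(A_n(Y^{n-1}), a) \le (A_n(Y^{n-1}) - a)^2 \, \mathbb{E}\left[\left(6\sigma_x^2 + 8\frac{Y^2}{a^2}\right) J(X; a \| Y)\right] \tag{29} $$

holds true with arbitrarily high probability. Taking the
expectation of both sides of Eq. (29) with respect to $Y^{n-1}$, we
obtain

$$ \mathbb{E}\left[R(A_n(Y^{n-1}), a)\right] \le \mathbb{E}\left[(A_n(Y^{n-1}) - a)^2\right] \, \mathbb{E}\left[\left(6\sigma_x^2 + 8\frac{Y^2}{a^2}\right) J(X; a \| Y)\right]. $$

Take the infimum of both sides over $\mathcal{A}$ and assume $\mathcal{A}$ contains
an efficient estimator [8, p. 92]. By definition, an efficient
estimator achieves the Cramér-Rao bound. This means

$$ \mathbb{E}\left[(A_n(Y^{n-1}) - a)^2\right] = \frac{1}{J(Y^{n-1}; a)}. $$

Since $Y_n$ is i.i.d., by additivity of Fisher information

$$ J(Y^{n-1}; a) = (n - 1) J(Y; a). $$

Therefore,

$$ \inf_{A \in \mathcal{A}} \mathbb{E}\left[R(A_n(Y^{n-1}), a)\right] \le \frac{1}{n-1} \, \frac{\mathbb{E}\left[\left(6\sigma_x^2 + 8\frac{Y^2}{a^2}\right) J(X; a \| Y)\right]}{J(Y; a)}. $$
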

C. Proof of Lemma 4.1 

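By Proposition A.1, we have

$$ \frac{(\phi_{\hat{a}}(y) - \phi_a(y))^2}{\mathbb{E}_{\hat{a}}[X^2|y] + \mathbb{E}_a[X^2|y]} \le 2 D(P_{\hat{a}|y} \| P_{a|y}). $$

Substituting from Eq. (28) and taking the average with respect
to $Y$, we obtain

$$ RR(\hat{a}, a) \le (\hat{a} - a)^2 \, \mathbb{E}\left[(\nabla \ln f_a(X|Y))^2\right] + o(\hat{a} - a)^2, $$

and conclude the proof.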

D. Proof of Lemma 4.3 

Since $X$ does not depend on $a$, $J(X; a) = 0$, and hence

$$ \rho(a) = \frac{J(X; a | Y)}{J(Y; a)} = \frac{J(Y; a | X)}{J(Y; a)} - 1. $$

Moreover, since the additive noise is Gaussian, the equality

$$ J(Y; a | X) = \frac{\sigma_x^2}{\sigma_v^2} $$

holds true for every distribution $P(X)$ with zero mean. As a
result, we obtain Eq. (16).
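The first equality in this proof relies on the chain rule for Fisher information, which is used implicitly. A sketch of the intermediate steps, assuming the usual regularity conditions and independence of $X$ and $V$:

```latex
\begin{align*}
J(X, Y; a) &= J(X; a) + J(Y; a | X) = J(Y; a) + J(X; a | Y), \\
J(X; a) = 0 \;&\Longrightarrow\; J(X; a | Y) = J(Y; a | X) - J(Y; a), \\
J(Y; a | X) &= \mathbb{E}\!\left[\left(\frac{\partial}{\partial a} \ln f_a(Y | X)\right)^{2}\right]
  = \mathbb{E}\!\left[\frac{X^2 (Y - aX)^2}{\sigma_v^4}\right]
  = \frac{\mathbb{E}[X^2]\,\sigma_v^2}{\sigma_v^4}
  = \frac{\sigma_x^2}{\sigma_v^2}.
\end{align*}
```

Dividing the second line by $J(Y; a)$ gives $\rho(a) + 1 = J(Y; a | X) / J(Y; a)$, and the third line then yields Eq. (16).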